Optimal ratio for data splitting

Abstract

It is common to split a dataset into training and testing sets before fitting a statistical or machine learning model. However, there is no clear guidance on how much data should be used for testing. In this article we show that the optimal training-to-testing splitting ratio is $\sqrt{p}:1$, where $p$ is the number of parameters in a linear regression model that explains the data well.
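As a rough illustration of what the stated ratio implies in practice, the sketch below converts a $\sqrt{p}:1$ training-to-testing ratio into set sizes. The function name `split_sizes`, the rounding rule, and the safeguard that keeps at least one observation in each set are assumptions made for this example and are not taken from the article.

```python
import math


def split_sizes(n_samples, n_params):
    """Return (n_train, n_test) for a sqrt(p):1 training-to-testing ratio.

    Hypothetical helper: the rounding and the minimum-size safeguard
    are illustrative choices, not part of the article.
    """
    if n_samples < 2 or n_params < 1:
        raise ValueError("need at least 2 samples and 1 parameter")
    # A ratio of sqrt(p):1 means the test share is 1 / (sqrt(p) + 1).
    test_fraction = 1.0 / (math.sqrt(n_params) + 1.0)
    n_test = round(n_samples * test_fraction)
    # Keep both sets non-empty.
    n_test = min(max(n_test, 1), n_samples - 1)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(1000, 9)
print(n_train, n_test)  # 750 250
```

With $p = 9$ the ratio is $3:1$, so a quarter of the data is held out for testing. A larger $p$ shrinks the testing share.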

Similar resources

Optimal Pilot-to-Data Power Ratio for Massive SIMO Systems

We consider a single cell single input multiple output (SIMO) system employing orthogonal frequency division multiplexing (OFDM). In such systems, setting the pilot-to-data power ratio (PDPR) has a large impact on the spectral and energy efficiency. In this paper we provide a closed form solution for the mean square error (MSE) of the received data as a function of the PDPR assuming Gaussian ch...

Determination of Optimal Sampling Design for Spatial Data Analysis

Extended Abstract. Inferences for spatial data are affected substantially by the spatial configuration of the network of sites where measurements are taken. Consider the following standard data-model framework for spatial data. Suppose a continuous, spatially-varying quantity, Z, is to be observed at a predetermined number, n, of points ...

An Optimal Model for Medicine Preparation Using Data Mining

Introduction: Lack of financial resources and liquidity are the main problems of hospitals. Pharmacies are one of the sectors that affect the turnover of hospitals and due to lack of forecast for the use and supply of medicines, at the end of the year, encounter over-inventory, large volumes of expired medicines, and sometimes shortage of medicines. Therefore, medicine prediction using availabl...

On Optimal Node Splitting for R-trees

The problem of finding an optimal bipartition of a rectangle set has a direct impact on query performance of dynamic R-trees. During update operations, overflowed nodes need to be split (bipartitioned) with the goal of minimizing resultant expected query time. The previous algorithm for optimal node splitting requires exponential time. One contribution of this paper is a polynomial time algorit...

An extended of multiple criteria data envelopment analysis models for ratio data

One problem with traditional data envelopment analysis models in the multiple form is that the weights corresponding to certain inputs and outputs are considered zero in the calculation of efficiency, which means that not all input and output components are utilized for the evaluation of efficiency, as some are ignored. The above issue causes the efficiency score of the under evalua...

Journal

Journal title: Statistical Analysis and Data Mining

Year: 2022

ISSN: 1932-1864, 1932-1872

DOI: https://doi.org/10.1002/sam.11583